Automation of gene function prediction through modeling human curators' decisions in GO phylogenetic annotation project
Abstract
The Gene Ontology Consortium launched the GO-PAINT project (Phylogenetic Annotation and INference Tool) 9 years ago, and PAINT is currently used in the GO Reference Genome Annotation Project to support inference of GO terms (molecular function, cellular component and biological process) by homology. PAINT uses a phylogenetic model to infer gene function by homology, a process that requires manual curation by experienced biocurators. A tremendous amount of time and effort has been spent on the GO-PAINT project, yielding more than 4000 fully annotated phylogenetic families with more than 170,000 annotations. These preliminary data make it possible to represent the curation process algorithmically and to annotate the remaining 9000 unannotated phylogenetic families automatically. Here we present an automated pipeline for phylogenetic annotation and inference, which simulates the standard annotation procedures of curators and models the curators’ decisions during the manual curation process. The pipeline has been built into the newest version of the PAINT software, available at http://www.pantherdb.org/downloads/index.jsp. The standalone automation pipeline and datasets are available at https://github.com/haimingt/GO-PAINT-automation.
Similar resources
Large-scale inference of gene function through phylogenetic annotation of Gene Ontology terms: case study of the apoptosis and autophagy cellular processes
We previously reported a paradigm for large-scale phylogenomic analysis of gene families that takes advantage of the large corpus of experimentally supported Gene Ontology (GO) annotations. This 'GO Phylogenetic Annotation' approach integrates GO annotations from evolutionarily related genes across ∼100 different organisms in the context of a gene family tree, in which curators build an explici...
Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium
The goal of the Gene Ontology (GO) project is to provide a uniform way to describe the functions of gene products from organisms across all kingdoms of life and thereby enable analysis of genomic data. Protein annotations are either based on experiments or predicted from protein sequences. Since most sequences have not been experimentally characterized, most available annotations need to be bas...
Molecular and Phylogenetic Analysis and Protein Structural modeling of NS Gene of Human Influenza A Virus Subtype H1N1 Circulating in Iran 2015 & 2017
Abstract Background: The NS (non-structural) genomic segment of influenza A virus expresses two proteins (NS1 and NS2) which are responsible for the virulence and pathogenicity of virus. In this study we investigate the characterization and variability of the NS gene recovered from H1N1 influenza viruses isolated from Iranian patients during the 2017 seasonal outbreak and from high...
The Gene Ontology Task at BioCreative IV
Gene Ontology (GO) annotation is a common task among model organism database (MOD) groups. It is a very time-consuming and labor-intensive task, thus often considered as one of the bottlenecks in literature curation. There is a growing need for semi- or fully-automated GO curation techniques that will help database curators rapidly and accurately identify gene function information in full-length ...
Gene ontology annotation by density and gravitation models.
Gene Ontology (GO) is developed to provide standard vocabularies of gene products in different databases. The process of annotating GO terms to genes requires curators to read through lengthy articles. Methods for speeding up or automating the annotation process are thus of great importance. We propose a GO annotation approach using full-text biomedical documents for directing more relevant pap...